MEASURE THEORETICAL ENTROPY OF COVERS 



URI SHAPIRA 

Abstract. In this paper we introduce three notions of measure theoretical entropy of 
a measurable cover $\mathcal{U}$ in a measure theoretical dynamical system. Two of them were 
already introduced in [R], and the new one is defined only in the ergodic case. We then 
prove that these three notions coincide, thus answering a question posed in [R], and we 
recover a variational inequality (proved in [GW]) and a proof of the classical variational 
principle based on a comparison between the entropies of covers and partitions. 



1. Introduction 

In this paper a measure theoretical dynamical system (m.t.d.s) is a four tuple $(X, \mathcal{B}, \mu, T)$, 
where $(X, \mathcal{B})$ is a standard space (i.e. isomorphic to $[0, 1]$ with the Borel $\sigma$-algebra), $\mu$ is 
a probability measure on $(X, \mathcal{B})$ and $T$ is an invertible measure preserving map from $X$ 
to itself. 

A topological dynamical system (t.d.s) is a pair (X, T), where X is a compact metric 
space and T is a homeomorphism from X to itself. 

In [R] the author introduced two notions of measure theoretical entropy of a cover, both 
generalizing the definition of measure theoretical entropy of a partition and influenced by 
[BGH]. Namely, 

(1) $h_\mu^+(\mathcal{U}) = \inf_{\alpha \succeq \mathcal{U}} h_\mu(\alpha)$ 

(2) $h_\mu^-(\mathcal{U}) = \lim_n \frac{1}{n} \inf_{\alpha \succeq \mathcal{U}_0^{n-1}} H_\mu(\alpha)$ 

It was shown there, among other things, that $h_\mu^-(\mathcal{U}) \le h_\mu^+(\mathcal{U})$ and that in the topological 
case (i.e. a t.d.s and an open cover), one can always find an invariant measure $\mu$ such that 
$h_\mu^-(\mathcal{U}) = h_{top}(\mathcal{U})$. This generalizes the result from [BGH] asserting that in the topological 
case one can always find an invariant measure $\mu$ such that $h_\mu^+(\mathcal{U}) \ge h_{top}(\mathcal{U})$. 

The question whether $h_\mu^-(\mathcal{U}) = h_\mu^+(\mathcal{U})$ arose. In [HMRY] the authors continued the 
research on these concepts and proved, among other results, with the aid of the Jewett-Krieger 
theorem, that if there exists a t.d.s, an invariant measure $\mu$ and an open cover $\mathcal{U}$ such 
that $h_\mu^-(\mathcal{U}) < h_\mu^+(\mathcal{U})$, then one can find such a situation in a uniquely ergodic t.d.s. 
Recently, B. Weiss and E. Glasner [GW] showed that if $(X, T)$ is a t.d.s and $\mathcal{U}$ is any cover, 
then for any invariant measure $\mu$, $h_\mu^+(\mathcal{U}) \le h_{top}(\mathcal{U})$, and so, combining these results, one 
concludes that for a t.d.s and an open cover we have $h_\mu^-(\mathcal{U}) = h_\mu^+(\mathcal{U})$. 
The measure theoretical entropy of a partition $\alpha$ in an ergodic m.t.d.s can be defined 
as $\lim \frac{1}{n} \log \mathcal{N}(\alpha_0^{n-1}, \epsilon)$, where $0 < \epsilon < 1$ and $\mathcal{N}(\alpha_0^{n-1}, \epsilon)$ is the minimum number of atoms 
of $\alpha_0^{n-1}$ needed to cover $X$ up to a set of measure less than $\epsilon$. (See [Ru]). 
In this paper we follow this line and in section 4 define a notion of measure theoretical 
entropy for a cover $\mathcal{U}$ of an ergodic m.t.d.s as $h_\mu^e(\mathcal{U}) = \lim \frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \epsilon)$ (where 
$0 < \epsilon < 1$). We prove (Theorem 4.2) the existence of the limit and its independence of $\epsilon$, 
in a different way from [Ru], using Strong Rohlin Towers. This can serve as an alternative 
proof of the fact that the above definition of measure theoretical entropy of a partition in 
an ergodic m.t.d.s is well defined. 

We show in a direct way that in the ergodic case the three notions $h_\mu^-(\mathcal{U})$, $h_\mu^+(\mathcal{U})$, $h_\mu^e(\mathcal{U})$ 
coincide (Theorems 4.4, 4.5), and from the ergodic decomposition for $h_\mu^-(\mathcal{U})$, $h_\mu^+(\mathcal{U})$, 
proved in [HMRY], we deduce that $h_\mu^-(\mathcal{U}) = h_\mu^+(\mathcal{U})$ in the general case (Corollary 5.2), 
and so we can denote this number by $h_\mu(\mathcal{U}, T)$ or $h_\mu(\mathcal{U})$. 

We also get an immediate proof of a slight generalization of the inequality $h_\mu(\mathcal{U}) \le h_{top}(\mathcal{U})$, 
mentioned earlier, from [GW], to the non topological case (Theorem 6.1). 

Acknowledgements: This paper was written as an M.Sc thesis at the Hebrew University 
of Jerusalem under the supervision of Prof. Benjamin Weiss. I would like to thank 
Prof. Weiss for introducing me to the subject and for sharing with me his and Eli Glasner's 
valuable ideas. 

2. Preliminaries 

Recall that in the following a measure theoretical dynamical system (m.t.d.s) is a four 
tuple $(X, \mathcal{B}, \mu, T)$, where $(X, \mathcal{B})$ is a standard space, $\mu$ is a probability measure on $(X, \mathcal{B})$ 
and $T$ is an invertible measure preserving transformation of $X$. 

2.1. Definition. 

• A cover of $X$ is a finite collection of measurable sets that cover $X$. 

• The collection of covers of $X$ will be denoted by $\mathcal{C}_X$. 

• A partition of $X$ is a cover of $X$ whose elements are mutually disjoint. 

• The collection of partitions of $X$ will be denoted by $\mathcal{P}_X$. 
Usually we denote covers by $\mathcal{U}, \mathcal{V}$ and partitions by $\alpha, \beta, \gamma$ etc. 

• We say that a cover $\mathcal{U}$ is finer than $\mathcal{V}$ ($\mathcal{U} \succeq \mathcal{V}$) if any element of $\mathcal{U}$ is contained in 
an element of $\mathcal{V}$. 

• For any $\mathcal{U} \in \mathcal{C}_X$ and $k \in \mathbb{Z}$ we denote by $T^k(\mathcal{U})$ the cover whose elements are the 
sets of the form $T^k(U)$ where $U \in \mathcal{U}$. 

• We define the join, $\mathcal{U} \vee \mathcal{V}$, of two covers $\mathcal{U}, \mathcal{V}$, to be the cover whose elements are 
sets of the form $U \cap V$ where $U \in \mathcal{U}$ and $V \in \mathcal{V}$. 

• When the transformation $T$ is understood we denote, for $l > k$, the cover 
$T^{-k}(\mathcal{U}) \vee T^{-(k+1)}(\mathcal{U}) \vee \cdots \vee T^{-l}(\mathcal{U})$ by $\mathcal{U}_k^l$. 
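
The join and the refinement relation are easy to experiment with on a finite set. The following is only an illustrative sketch: covers are modelled as collections of Python `frozenset`s, the helper names `join` and `is_finer` are hypothetical, and dropping empty intersections from the join is a convenience choice made here, not part of the definition above.

```python
def join(U, V):
    """Join of two covers: all intersections u & v.

    Empty intersections are dropped and duplicates are merged
    (a convenience choice for this sketch).
    """
    return {u & v for u in U for v in V if u & v}


def is_finer(U, V):
    """True if every element of U is contained in some element of V."""
    return all(any(u <= v for v in V) for u in U)
```

On such toy examples the join is finer than each of the two covers it is built from, which is the property used repeatedly in the sequel.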
2.2. Definition. For $0 < \delta < 1$ define $H(\delta) = -\delta \log \delta - (1 - \delta) \log(1 - \delta)$. Note that 
$\lim_{\delta \to 0} H(\delta) = 0$. 

In the sequel, we will prove some combinatorial lemmas and often we will encounter 
the expression $\sum_{j \le \delta K} \binom{K}{j}$. We shall make use of the next elementary lemma: 

2.3. Lemma (lemma 1.5.4 in [Sh1]): If $\delta < \frac{1}{2}$ then $\sum_{j \le \delta K} \binom{K}{j} \le 2^{H(\delta) K}$. 
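
As a sanity check, the bound of Lemma 2.3 can be tested numerically for specific values. This is a minimal sketch that assumes logarithms to base 2 (so that the bound reads $2^{H(\delta)K}$); the function names are hypothetical, and a small tolerance is added before taking the floor of $\delta K$ to guard against floating point rounding.

```python
from math import comb, floor, log2


def binary_entropy(delta: float) -> float:
    """H(delta) = -delta*log2(delta) - (1-delta)*log2(1-delta), 0 < delta < 1."""
    return -delta * log2(delta) - (1 - delta) * log2(1 - delta)


def binomial_tail(K: int, delta: float) -> int:
    """Sum of C(K, j) over all integers 0 <= j <= delta*K."""
    top = floor(delta * K + 1e-9)  # tolerance against float rounding
    return sum(comb(K, j) for j in range(top + 1))


def bound_holds(K: int, delta: float) -> bool:
    """Check sum_{j <= delta*K} C(K, j) <= 2^(H(delta)*K) for one pair."""
    return log2(binomial_tail(K, delta)) <= binary_entropy(delta) * K
```

A check of this kind verifies individual cases only; it is not a substitute for the proof in the cited reference.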

2.4. Definition. A m.t.d.s $(X, \mathcal{B}, \mu, T)$ is said to be aperiodic if for every $n \in \mathbb{N}$, 
$\mu(\{x \mid T^n x = x\}) = 0$. 

An ergodic system which is not aperiodic is easily seen to be a cyclic permutation on a 
finite number of atoms. 

One of our main tools in practice will be the Strong Rohlin Lemma ([Sh2] p. 15): 

2.5. Lemma. Let $(X, \mathcal{B}, \mu, T)$ be an ergodic, aperiodic system and let $\alpha \in \mathcal{P}_X$. Then for 
any $\delta > 0$ and $n \in \mathbb{N}$, one can find a set $B \in \mathcal{B}$ such that $B, TB, \ldots, T^{n-1}B$ are mutually 
disjoint, $\mu(\bigcup_{i=0}^{n-1} T^i B) > 1 - \delta$ and the distribution of $\alpha$ is the same as the distribution of 
the partition $\alpha|_B$ that $\alpha$ induces on $B$. 

The data $(n, \delta, B, \alpha)$ will be called a strong Rohlin tower of height $n$ and error $\delta$ with 
respect to $\alpha$ and with $B$ as a base. 

3. Measure theoretical entropy of covers 

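Let $(X, \mathcal{B}, \mu, T)$ be a m.t.d.s. The definitions and proofs in this section were introduced 
in [R]. 

3.1. Definition. For $\mathcal{U} \in \mathcal{C}_X$ we define the entropy of $\mathcal{U}$ as: 
$H_\mu(\mathcal{U}) = \inf_{\alpha \succeq \mathcal{U}} H_\mu(\alpha)$. 

3.2. Proposition. 

(1) If $\mathcal{U}, \mathcal{V} \in \mathcal{C}_X$ then $H_\mu(\mathcal{U} \vee \mathcal{V}) \le H_\mu(\mathcal{U}) + H_\mu(\mathcal{V})$. 

(2) For every $\mathcal{U} \in \mathcal{C}_X$, $H_\mu(T^{-1}\mathcal{U}) = H_\mu(\mathcal{U})$. 

3.3. Corollary. If $\mathcal{U} \in \mathcal{C}_X$ then the sequence $H_\mu(\mathcal{U}_0^{n-1})$ is sub-additive. 

3.4. Corollary. If $\mathcal{U} \in \mathcal{C}_X$ then the sequence $\frac{1}{n} H_\mu(\mathcal{U}_0^{n-1})$ converges to $\inf_n \frac{1}{n} H_\mu(\mathcal{U}_0^{n-1})$. 

Two ways of generalizing the definition of measure theoretical entropy of a partition to 
a cover are: 

3.5. Definition. If $\mathcal{U} \in \mathcal{C}_X$, define 

(1) $h_\mu^-(\mathcal{U}, T) = \lim_n \frac{1}{n} H_\mu(\mathcal{U}_0^{n-1})$. 

(2) $h_\mu^+(\mathcal{U}, T) = \inf_{\alpha \succeq \mathcal{U}} h_\mu(\alpha, T)$. 

When $T$ is understood we usually omit it and write $h_\mu^-(\mathcal{U})$, $h_\mu^+(\mathcal{U})$. 
We shall see later that in fact $h_\mu^-(\mathcal{U}) = h_\mu^+(\mathcal{U})$. 

3.6. Proposition. 

(1) $h_\mu^-(\mathcal{U}) \le h_\mu^+(\mathcal{U})$. 

(2) For any $m \in \mathbb{N}$, $h_\mu^-(\mathcal{U}, T) = \frac{1}{m} h_\mu^-(\mathcal{U}_0^{m-1}, T^m)$. 

(3) $h_\mu^-(\mathcal{U}, T) = \lim_n \frac{1}{n} h_\mu^+(\mathcal{U}_0^{n-1}, T^n)$. 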

4. The ergodic case 

Throughout this section, $(X, \mathcal{B}, \mu, T)$ is an ergodic m.t.d.s. 
For $\mathcal{U} \in \mathcal{C}_X$, we denote by $\mathcal{N}(\mathcal{U}, \epsilon, \mu)$ the minimum number of elements of $\mathcal{U}$ needed 
to cover all of $X$ up to a set of measure less than $\epsilon$. When $\mu$ is understood we write 
$\mathcal{N}(\mathcal{U}, \epsilon)$. 
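
To make the quantity $\mathcal{N}(\mathcal{U}, \epsilon)$ concrete, here is a brute-force sketch for a finite probability space. The setting is an assumption made for illustration only: points are dictionary keys, the measure is given as exact `Fraction` weights, and the function name `min_cover_count` is hypothetical. The search is exponential in the size of the cover and is meant for toy examples.

```python
from itertools import combinations


def min_cover_count(cover, measure, eps):
    """Smallest number of sets of `cover` whose union misses measure < eps.

    cover:   list of sets of points
    measure: dict mapping each point to its weight (weights sum to 1)
    eps:     threshold, 0 < eps <= 1
    """
    total = sum(measure.values())
    for size in range(len(cover) + 1):
        for chosen in combinations(cover, size):
            covered = set().union(*chosen) if chosen else set()
            missed = total - sum(measure[x] for x in covered)
            if missed < eps:
                return size
    raise ValueError("the given sets do not cover the space up to eps")
```

Note that the count is non-increasing in `eps`, which is the monotonicity used at the start of the proof of Theorem 4.2.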

By a straightforward calculation one deduces from [Sh1] p. 51 the following: 

4.1. Theorem. If $(X, \mathcal{B}, \mu, T)$ is an ergodic m.t.d.s and $\alpha \in \mathcal{P}_X$, then for any $0 < \epsilon < 1$, 
$h_\mu(\alpha, T) = \lim \frac{1}{n} \log \mathcal{N}(\alpha_0^{n-1}, \epsilon)$. 

In view of this result, a natural way to generalize the definition of measure theoretical 
entropy of a partition to covers is the following: 

$h_\mu^e(\mathcal{U}, T) = \lim \frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \epsilon)$, 

where $0 < \epsilon < 1$. In order to do so we have to show that the above limit exists and is 
independent of $\epsilon$. 

4.2. Theorem. For any $0 < \epsilon < 1$, the sequence $\frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \epsilon)$ converges and the limit 
is independent of $\epsilon$. 

In order to prove this theorem we shall need a combinatorial lemma. Let us first 
introduce some terminology (in first reading the reader may skip the following discussion 
and turn to the discussion held after the proof of Lemma 4.3): 

• We say that two intervals in $\mathbb{N}$, $I, J$, are separated if there is $n \in \mathbb{N}$ such that for 
any $i \in I$, $j \in J$ we have $i < n < j$ or $j < n < i$. 

• We say that a collection $\{I_i\}_{i \in A}$ of intervals in $\mathbb{N}$ is a separated collection if any 
two of its elements are separated. 

• We say that a collection $\{I_i\}_{i \in A}$ of subintervals of an interval $[1, K]$ is a $(\lambda, \epsilon)$ 
separated cover of $[1, K]$ (for $0 < \lambda < 1$, $0 < \epsilon$), if it is separated and 

$$\left| \frac{|\bigcup_{i \in A} I_i|}{K} - \lambda \right| < \epsilon.$$ 

• Given a vector $\vec{\lambda} = (\lambda_1 \ldots \lambda_l)$, we denote 

$$\nu_r(\vec{\lambda}) = \prod_{j=r}^{l} (1 - \lambda_j)$$ 

or just $\nu_r$ when $\vec{\lambda}$ is understood. For $r > l$ we set $\nu_r = 1$. Note that for $j < l$ we 
have: 

$$\sum_{r=j+1}^{l} \lambda_r \nu_{r+1} = 1 - \nu_{j+1}.$$ 

In the following combinatorial lemma, we will be given $l$ separated collections $\{I_i^j\}_{i \in A_j}$, 
$j = 1 \ldots l$, of subintervals of a very long interval $[1, K]$. The knowledge about these 
collections is that the members of the $j$'th collection all have the same length, $N_j$, with 
$N_1 \ll N_2 \ll \cdots \ll N_l$, and every collection is very "equally distributed" in $[1, K]$ in some sense. 
We would like to extract, from these collections, a separated collection that will cover as 
much as we can from $[1, K]$. 

Let us denote by $\lambda_j$ the percentage of $[1, K]$ that is covered by the $j$'th collection and 
by $\vec{\lambda}$ the corresponding vector. Then $\lambda_l = 1 - \nu_l$ percent of $[1, K]$ is covered by $\{I_i^l\}$. 
The complement is of size $K\nu_l$ and we could cover $\lambda_{l-1}$ percent of it with the $\{I_i^{l-1}\}$'s. 
By now we covered $K(1 - \nu_{l-1})$ and we could cover $\lambda_{l-2}$ percent of the complement by 
the $\{I_i^{l-2}\}$'s. So by now we covered $K(1 - \nu_{l-2})$ of $[1, K]$. We go on this way and extract 
a separated collection that covers $1 - \nu_1$ percent of $[1, K]$. Let us now make these ideas 
precise. 
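
The bookkeeping in this heuristic can be checked with exact arithmetic. The sketch below is only an illustration of the idealized count (it ignores all error terms that Lemma 4.3 controls); the names `nu` and `greedy_coverage` are hypothetical, and the fractions $\lambda_j$ are passed as a Python list whose entry `lams[j-1]` plays the role of $\lambda_j$.

```python
from fractions import Fraction
from math import prod


def nu(lams, r):
    """nu_r = prod_{j=r}^{l} (1 - lambda_j), 1-based r; equals 1 for r > l."""
    return prod((1 - lam for lam in lams[r - 1:]), start=Fraction(1))


def greedy_coverage(lams):
    """Fraction covered after using collections l, l-1, ..., 1 in turn,
    each one covering its share lambda_j of what is still uncovered."""
    covered = Fraction(0)
    for lam in reversed(lams):
        covered += lam * (1 - covered)
    return covered
```

For any choice of fractions the greedy total equals $1 - \nu_1$, and the partial sums $\sum_{r=j+1}^{l} \lambda_r \nu_{r+1}$ telescope to $1 - \nu_{j+1}$.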

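4.3. Lemma. For any $l > 0$, there exists a positive function $\varphi = \varphi(N_1 \ldots N_l, \eta_1 \ldots \eta_l, \epsilon)$ 
(where $N_1 < N_2 < \cdots < N_l \in \mathbb{N}$, $\eta_i, \epsilon > 0$) such that 

$$\limsup_{\epsilon \to 0} \limsup_{N_1 \to \infty} \limsup_{\eta_1 \to 0} \cdots \limsup_{N_l \to \infty} \limsup_{\eta_l \to 0} \varphi(N_i, \eta_i, \epsilon) = 0. \quad (*)$$ 

and such that if $0 < \lambda_j < 1$, $j = 1 \ldots l$, and $\{I_i^j\}_{i \in A_j}$ are separated collections of subintervals 
of $[1, K]$ that satisfy: 

(a) For every $1 \le j \le l$, $|I_i^j| = N_j$. 

(b) For every $1 \le j \le l$, $\{I_i^j\}$ is a $(\lambda_j, \epsilon)$-separated cover of $[1, K]$. 

(c) For every $1 \le j < r \le l$, the number of subintervals, $J$, of $[1, K]$, of length $N_r$, 
which are not $(\lambda_j, \epsilon)$-separately covered by $\{I_i^j \subset J\}$ is less than $\eta_r K$. 

then there are sets $\tilde{A}_j \subset A_j$, $j = 1 \ldots l$, such that $\{\{I_i^j\}_{i \in \tilde{A}_j}\}_{j=1}^{l}$ is a separated collection 
and $[1, K]$ is $((1 - \nu_1(\vec{\lambda})), \varphi(N_i, \eta_i, \epsilon))$-separately covered by $\{\{I_i^j\}_{i \in \tilde{A}_j}\}_{j=1}^{l}$. 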

Proof. We will build the $\tilde{A}_j$'s by recursion, starting with $j = l$. Define $\tilde{A}_l = A_l$. Then 
from (b) we have that $\left| \frac{N_l |\tilde{A}_l|}{K} - \lambda_l \right| < \epsilon$. So if we define $f_l(N_i, \eta_i, \epsilon) = \epsilon$, then $f_l$ 
satisfies (*) and $[1, K]$ is $(\lambda_l \nu_{l+1}, f_l(N_i, \eta_i, \epsilon))$-separately covered by $\{I_i^l\}_{i \in \tilde{A}_l}$. Now, 
suppose we have defined $\tilde{A}_l \ldots \tilde{A}_{j+1}$ and positive functions $f_l \ldots f_{j+1}$ that satisfy (*), such 
that $\{\{I_i^r\}_{i \in \tilde{A}_r}\}_{r=j+1}^{l}$ is a separated collection and for every $j + 1 \le r \le l$, $[1, K]$ is 
$(\lambda_r \nu_{r+1}, f_r(N_i, \eta_i, \epsilon))$-separately covered by $\{I_i^r\}_{i \in \tilde{A}_r}$. Define now 

$$\tilde{A}_j = \{i \in A_j \mid I_i^j \text{ is separated from } \{I_s^r\}_{s \in \tilde{A}_r},\ r = j + 1 \ldots l\}.$$ 
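We want to estimate the size of $\tilde{A}_j$. 

Estimation from below: Choose $j + 1 \le r \le l$ and divide the members of $\{I_s^r\}_{s \in \tilde{A}_r}$ into 
good ones and bad ones according to (c), i.e., $I_s^r$ is good if it is $(\lambda_j, \epsilon)$-separately covered 
by $\{I_i^j \subset I_s^r\}$. We have at most $\eta_r K$ $I_s^r$'s which are bad and at most $|\tilde{A}_r|$ $I_s^r$'s which 
are good. Every bad $I_s^r$ rules out at most $\frac{N_r}{N_j} + 2$ $i$'s in $A_j$ from being in $\tilde{A}_j$. Every good 
$I_s^r$ rules out at most $\frac{N_r}{N_j}(\lambda_j + \epsilon) + 2$ $i$'s in $A_j$ from being in $\tilde{A}_j$. In total, the 
number of $i$'s in $A_j$ that are not in $\tilde{A}_j$ is at most: 

$$\sum_{r=j+1}^{l} |\tilde{A}_r| \left( \frac{N_r}{N_j}(\lambda_j + \epsilon) + 2 \right) + \eta_r K \left( \frac{N_r}{N_j} + 2 \right) = (**)$$ 

Note that because $[1, K]$ is $(\lambda_r \nu_{r+1}, f_r)$-separately covered by $\{I_s^r\}_{s \in \tilde{A}_r}$ we must have 

$$|\tilde{A}_r| \le \frac{K}{N_r}(\lambda_r \nu_{r+1} + f_r).$$ 

Using this we get: 

$$(**) \le \sum_{r=j+1}^{l} \frac{K}{N_r}(\lambda_r \nu_{r+1} + f_r)\left( \frac{N_r}{N_j}(\lambda_j + \epsilon) + 2 \right) + \eta_r K \left( \frac{N_r}{N_j} + 2 \right)$$ 

$$= \sum_{r=j+1}^{l} \frac{K}{N_j} \lambda_r \nu_{r+1}(\lambda_j + \epsilon) + \frac{K}{N_j} f_r (\lambda_j + \epsilon) + \frac{2K}{N_r}(\lambda_r \nu_{r+1} + f_r) + \frac{K}{N_j} \eta_r N_r + 2 \eta_r K$$ 

$$= \frac{K}{N_j} \lambda_j \Big( \sum_{r=j+1}^{l} \lambda_r \nu_{r+1} \Big) + \frac{K}{N_j} \sum_{r=j+1}^{l} \Big\{ \epsilon \lambda_r \nu_{r+1} + (\lambda_j + \epsilon) f_r + 2 \frac{N_j}{N_r}(\lambda_r \nu_{r+1} + f_r) + \eta_r (N_r + 2 N_j) \Big\} = (\aleph)$$ 

As mentioned earlier, $\sum_{r=j+1}^{l} \lambda_r \nu_{r+1} = 1 - \nu_{j+1}$, so we have that: 

$$|\tilde{A}_j| \ge |A_j| - (\aleph) \ge \frac{K}{N_j}(\lambda_j - \epsilon) - (\aleph)$$ 

$$= \frac{K}{N_j} \Big\{ \lambda_j \nu_{j+1} - \Big( \epsilon + \sum_{r=j+1}^{l} \Big\{ \epsilon \lambda_r \nu_{r+1} + (\lambda_j + \epsilon) f_r + 2 \frac{N_j}{N_r}(\lambda_r \nu_{r+1} + f_r) + \eta_r (N_r + 2 N_j) \Big\} \Big) \Big\}$$ 

Note that 

$$\Big| \epsilon + \sum_{r=j+1}^{l} \Big\{ \epsilon \lambda_r \nu_{r+1} + (\lambda_j + \epsilon) f_r + 2 \frac{N_j}{N_r}(\lambda_r \nu_{r+1} + f_r) + \eta_r (N_r + 2 N_j) \Big\} \Big|$$ 

$$\le \epsilon + \sum_{r=j+1}^{l} \Big\{ \epsilon + (1 + \epsilon) f_r + 2 \frac{N_j}{N_r}(1 + f_r) + \eta_r (N_r + 2 N_j) \Big\}$$ 

so if we denote the last expression by $f_j'(N_i, \eta_i, \epsilon)$, then we see that $f_j'$ satisfies (*) 
and $|\tilde{A}_j| \ge \frac{K}{N_j}(\lambda_j \nu_{j+1} - f_j')$. 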

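Estimation from above: For every $j + 1 \le r \le l$, we have that $|\tilde{A}_r| \ge \frac{K}{N_r}(\lambda_r \nu_{r+1} - f_r)$ 
and the number of bad $I_s^r$'s is at most $\eta_r K$, so we must have at least $\frac{K}{N_r}(\lambda_r \nu_{r+1} - f_r) - \eta_r K$ 
good $I_s^r$'s. Every good $I_s^r$ rules out at least $\frac{N_r}{N_j}(\lambda_j - \epsilon)$ $i$'s in $A_j$ from being in $\tilde{A}_j$. So the 
number of $i$'s in $A_j$ that are not in $\tilde{A}_j$ is at least: 

$$\sum_{r=j+1}^{l} \frac{N_r}{N_j}(\lambda_j - \epsilon) \Big\{ \frac{K}{N_r}(\lambda_r \nu_{r+1} - f_r) - \eta_r K \Big\}$$ 

and so 

$$|\tilde{A}_j| \le |A_j| - \sum_{r=j+1}^{l} \frac{N_r}{N_j}(\lambda_j - \epsilon) \Big\{ \frac{K}{N_r}(\lambda_r \nu_{r+1} - f_r) - \eta_r K \Big\}$$ 

$$\le \frac{K}{N_j} \Big\{ \lambda_j + \epsilon - \sum_{r=j+1}^{l} \big\{ \lambda_j (\lambda_r \nu_{r+1} - f_r) - \epsilon (\lambda_r \nu_{r+1} - f_r) - \eta_r N_r (\lambda_j - \epsilon) \big\} \Big\}$$ 

$$= \frac{K}{N_j} \Big\{ \lambda_j \Big( 1 - \sum_{r=j+1}^{l} \lambda_r \nu_{r+1} \Big) + \epsilon + \sum_{r=j+1}^{l} \big( \lambda_j f_r + \epsilon (\lambda_r \nu_{r+1} - f_r) + \eta_r N_r (\lambda_j - \epsilon) \big) \Big\}$$ 

$$\le \frac{K}{N_j} \Big\{ \lambda_j \nu_{j+1} + \epsilon + \sum_{r=j+1}^{l} \big( f_r + \epsilon (1 + f_r) + \eta_r N_r (1 + \epsilon) \big) \Big\}$$ 

so if we denote 

$$f_j''(N_i, \eta_i, \epsilon) = \epsilon + \sum_{r=j+1}^{l} \big( f_r + \epsilon (1 + f_r) + \eta_r N_r (1 + \epsilon) \big)$$ 

then $f_j''$ satisfies (*) and $|\tilde{A}_j| \le \frac{K}{N_j}(\lambda_j \nu_{j+1} + f_j'')$. Define $f_j = \max(f_j', f_j'')$, where $f_j'$ is the function obtained in the estimation from below; then 
$f_j$ satisfies (*) and 

$$\left| \frac{N_j |\tilde{A}_j|}{K} - \lambda_j \nu_{j+1} \right| \le f_j.$$ 

We have defined $\tilde{A}_j \subset A_j$ and a positive function $f_j$ that satisfies (*), such that $\{\{I_i^r\}_{i \in \tilde{A}_r}\}_{r=j}^{l}$ 
is a separated collection and $[1, K]$ is $(\lambda_j \nu_{j+1}, f_j)$-separately covered by $\{I_i^j\}_{i \in \tilde{A}_j}$. 
We continue this way and define sets $\tilde{A}_j \subset A_j$ and positive functions $f_j$, $j = 1 \ldots l$, such 
that $\{\{I_i^j\}_{i \in \tilde{A}_j}\}_{j=1}^{l}$ is a separated collection and $[1, K]$ is $(\lambda_j \nu_{j+1}, f_j)$-separately covered 
by $\{I_i^j\}_{i \in \tilde{A}_j}$. 

Note that this means: 

$$K \Big( \sum_{j=1}^{l} \lambda_j \nu_{j+1} - \sum_{j=1}^{l} f_j \Big) \le \Big| \bigcup_{j=1}^{l} \bigcup_{i \in \tilde{A}_j} I_i^j \Big| \le K \Big( \sum_{j=1}^{l} \lambda_j \nu_{j+1} + \sum_{j=1}^{l} f_j \Big)$$ 

and so, if we define $\varphi = \sum_j f_j$, then $\varphi$ satisfies (*) and $\{\{I_i^j\}_{i \in \tilde{A}_j}\}_{j=1}^{l}$ is a $(1 - \nu_1, \varphi)$-
separated cover of $[1, K]$. □ 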

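Before turning to the proof of theorem 4.2, let us present some terminology. In the 
following, $\mathcal{U} = \{U_1 \ldots U_M\}$ is a cover of $X$. For any $\rho > 0$, we can find a partition 
$\beta \succeq \mathcal{U}$ such that $\mathcal{N}(\mathcal{U}, \rho) = \mathcal{N}(\beta, \rho)$. Namely, we choose a subset of $\mathcal{U}$ of $N = \mathcal{N}(\mathcal{U}, \rho)$ 
elements that covers $X$ up to a set of measure $< \rho$, $\{U_{i_1} \ldots U_{i_N}\}$, and define $C_1 = U_{i_1}$, 
$C_j = U_{i_j} \setminus \bigcup_{m=1}^{j-1} U_{i_m}$, $j = 2 \ldots N$. The $C_j$'s are disjoint, $C_j \subset U_{i_j}$ and $\bigcup_{j=1}^{N} C_j = \bigcup_{j=1}^{N} U_{i_j}$. 
Extend the collection $\{C_j\}_{j=1}^{N}$ to a partition, $\beta$, refining $\mathcal{U}$, in some way. Then, because 
$\beta \succeq \mathcal{U}$, we have $\mathcal{N}(\beta, \rho) \ge N$ and from our construction it follows that $\mathcal{N}(\beta, \rho) \le N$. 

• We call such a partition a $\rho$-good partition for $\mathcal{U}$. 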
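
The first step of this construction, turning the chosen sets into disjoint pieces, can be sketched as follows. This is only an illustration on finite sets with a hypothetical helper name `disjointify`; it does not perform the final extension of the pieces to a partition of all of $X$.

```python
def disjointify(chosen):
    """Return C_1 = U_1 and C_j = U_j minus the union of the earlier sets.

    `chosen` is the ordered list of selected cover elements; the pieces
    returned are pairwise disjoint, C_j is a subset of the j-th set, and
    the union of the pieces equals the union of the selected sets.
    """
    seen, pieces = set(), []
    for u in chosen:
        pieces.append(set(u) - seen)
        seen |= set(u)
    return pieces
```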

If $(X, \mathcal{B}, \mu, T)$ is aperiodic and $N \in \mathbb{N}$, $\rho, \delta > 0$ are given, then for a $\rho$-good partition $\beta$ 
for $\mathcal{U}_0^{N-1}$, we can construct a strong Rohlin tower with height $N + 1$ and error $< \delta$. Let 
$\tilde{B}$ denote the base of the tower and let $B \subset \tilde{B}$ be a union of $\mathcal{N}(\beta, \rho)$ atoms of $\beta|_{\tilde{B}}$ that 
covers $\tilde{B}$ up to a set of measure less than $\rho \mu(\tilde{B})$. 

• We call $(\beta, \tilde{B}, B)$ a good base for $(\mathcal{U}, N, \rho, \delta)$. 

• For a set $J \subset \mathbb{N}$, a $(\mathcal{U}, J)$-name is a function $f : J \to \{1 \ldots M\}$. 

• $f$ is a name of $x \in X$ if $x \in \bigcap_{j \in J} T^{-j} U_{f(j)}$. 

• We denote the set of elements of $X$ with $f$ as a name by $S_f$. 

• A set of $(\mathcal{U}, J)$-names, $\{f_i\}$, covers a set $C \in \mathcal{B}$ if $C \subset \bigcup_i S_{f_i}$. 

In the sequel, we will want to estimate the number of elements of $\mathcal{U}_0^{N-1}$ needed to cover 
a set $C \in \mathcal{B}$, i.e., we will want to estimate the number of $(\mathcal{U}, [0, N - 1])$-names needed to 
cover $C$. The usual way to do so is to find a collection of disjoint sets $J_i \subset [0, N - 1]$, 
$i = 1 \ldots m$, that covers most of $[0, N - 1]$, such that we can bound the number of $(\mathcal{U}, J_i)$-
names needed to cover $C$. If we can cover $C$ by $R_i$ $(\mathcal{U}, J_i)$-names, $\{f_n^i\}_{n=1}^{R_i}$, then the set 
$\Gamma = \{f : [0, N - 1] \to \{1 \ldots M\} \mid f|_{J_i} \in \{f_n^i\}_{n=1}^{R_i}\}$ of $(\mathcal{U}, [0, N - 1])$-names covers $C$ 
and contains $\prod_i R_i \cdot M^{N - \sum_i |J_i|}$ elements. 
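
The counting formula for glued names can be verified by brute-force enumeration on small parameters. The sketch below is illustrative only: names are represented as tuples over `{1..M}`, each block is a pair of positions and allowed restrictions, and the function name `glued_names` is hypothetical. The position sets are assumed to be disjoint, as in the text.

```python
from itertools import product


def glued_names(N, M, blocks):
    """All names f: [0, N-1] -> {1..M} whose restriction to each J_i
    lies in the corresponding allowed set.

    blocks: list of (J_i, allowed_i), where J_i is a tuple of positions
            and allowed_i is a set of tuples over {1..M} of length |J_i|.
    """
    names = []
    for f in product(range(1, M + 1), repeat=N):
        if all(tuple(f[t] for t in J) in allowed for J, allowed in blocks):
            names.append(f)
    return names
```

With $R_i$ allowed restrictions on each $J_i$, the enumeration returns exactly $\prod_i R_i \cdot M^{N - \sum_i |J_i|}$ names, since the positions outside the $J_i$ are unconstrained.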

This situation occurs in our proofs in the following way: Let $(\beta, \tilde{B}, B)$ be a good base 
for $(\mathcal{U}, N, \rho, \delta)$ and $K \gg N$. Set $C$ to be the set of elements of $X$ that visit $B$ at 
times $i_1 < \cdots < i_m$ between $0$ and $K - N$ (under the action of $T$). Then for each $j$ we can cover $C$ 
by no more than $\mathcal{N}(\beta, \rho)$ $(\mathcal{U}, [i_j, i_j + N - 1])$-names. We can now turn to the proof of 
theorem 4.2. 

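Proof (theorem 4.2): If $(X, \mathcal{B}, \mu, T)$ is periodic, it follows from the ergodicity that the 
system is a cyclic permutation on a finite set of atoms and for every $0 < \epsilon < 1$ we 
have $\lim \frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \epsilon) = 0$. We assume, then, that the system is aperiodic and thus 
we are able to use the Strong Rohlin Lemma. Given $0 < \rho_2 < \rho_1 < 1$, we need to 
show that the limits $\lim \frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \rho_i)$, $i = 1, 2$, exist and are equal. Note that for 
every $n$, we have that $\mathcal{N}(\mathcal{U}_0^{n-1}, \rho_1) \le \mathcal{N}(\mathcal{U}_0^{n-1}, \rho_2)$, and thus the lim inf and the lim sup of 
$\frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \rho_1)$ are bounded above by the lim inf and the lim sup of 
$\frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \rho_2)$ respectively, so it is enough to prove that 

$$\limsup \frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \rho_2) \le \liminf \frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \rho_1).$$ 

Let $0 < \epsilon_0 < \frac{1}{2}$ be given and denote: 

$$h_0 = \liminf \frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \rho_1), \qquad L = \{n \in \mathbb{N} \mid |h_0 - \tfrac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \rho_1)| < \epsilon_0\},$$ 

so $L$ contains arbitrarily large numbers. Choose $\ell \in \mathbb{N}$ large enough so that 

$$\left( \tfrac{1}{2}(1 + \rho_1) \right)^{\ell} \log M < \epsilon_0, \qquad \left( \tfrac{1}{2}(1 + \rho_1) \right)^{\ell} + \epsilon_0 < \tfrac{1}{2} \quad (*).$$ 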

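The towers construction: Remember the function $\varphi$ from the combinatorial lemma 
(Lemma 4.3). It satisfies: 

$$\limsup_{\epsilon \to 0} \limsup_{N_1 \to \infty} \limsup_{\eta_1 \to 0} \cdots \limsup_{N_\ell \to \infty} \limsup_{\eta_\ell \to 0} \varphi(N_i, \eta_i, \epsilon) = 0$$ 

so we can choose $\epsilon > 0$, small enough, such that 

$$\limsup_{N_1 \to \infty} \limsup_{\eta_1 \to 0} \cdots \limsup_{N_\ell \to \infty} \limsup_{\eta_\ell \to 0} \varphi(N_i, \eta_i, \epsilon) < \epsilon_0.$$ 

Choose a small enough $\delta > 0$ (in a manner specified later). Choose $N_1 \in L$, large enough, 
such that 

$$\limsup_{\eta_1 \to 0} \limsup_{N_2 \to \infty} \cdots \limsup_{N_\ell \to \infty} \limsup_{\eta_\ell \to 0} \varphi(N_i, \eta_i, \epsilon) < \epsilon_0.$$ 

Find a good base $(\beta_1, \tilde{B}_1, B_1)$ for $(\mathcal{U}, N_1, \rho_1, \delta)$. Choose $\eta_1 > 0$, small enough, such that 

$$\limsup_{N_2 \to \infty} \limsup_{\eta_2 \to 0} \cdots \limsup_{N_\ell \to \infty} \limsup_{\eta_\ell \to 0} \varphi(N_i, \eta_i, \epsilon) < \epsilon_0.$$ 

From the ergodicity, we can choose $N_2 \in L$, large enough, such that 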

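• $\limsup_{\eta_2 \to 0} \cdots \limsup_{N_\ell \to \infty} \limsup_{\eta_\ell \to 0} \varphi(N_i, \eta_i, \epsilon) < \epsilon_0$. 

• $\mu\{x \mid |\frac{1}{N_2} \sum_{r=0}^{N_2 - N_1} \chi_{B_1}(T^r x) - \mu(B_1)| < \frac{\epsilon}{N_1}\} > 1 - \eta_1$. 

Find a good base $(\beta_2, \tilde{B}_2, B_2)$ for $(\mathcal{U}, N_2, \rho_1, \delta)$. Choose $\eta_2 > 0$, small enough, such that 

$$\limsup_{N_3 \to \infty} \limsup_{\eta_3 \to 0} \cdots \limsup_{N_\ell \to \infty} \limsup_{\eta_\ell \to 0} \varphi(N_i, \eta_i, \epsilon) < \epsilon_0.$$ 

Again, from the ergodicity, we can choose $N_3 \in L$ such that 

• $\limsup_{\eta_3 \to 0} \cdots \limsup_{N_\ell \to \infty} \limsup_{\eta_\ell \to 0} \varphi(N_i, \eta_i, \epsilon) < \epsilon_0$. 

• $\mu\{x \mid |\frac{1}{N_3} \sum_{r=0}^{N_3 - N_j} \chi_{B_j}(T^r x) - \mu(B_j)| < \frac{\epsilon}{N_j},\ j = 1, 2\} > 1 - \eta_2$. 

In this way we construct, inductively, $N_1 < N_2 < \cdots < N_\ell$ (all from $L$), $\eta_1 \ldots \eta_\ell$ and good 
bases $(\beta_j, \tilde{B}_j, B_j)$ for $(\mathcal{U}, N_j, \rho_1, \delta)$, such that $\varphi(N_i, \eta_i, \epsilon) < \epsilon_0$ and if we denote 

$$F_j = \Big\{ x \ \Big|\ \Big| \frac{1}{N_j} \sum_{r=0}^{N_j - N_i} \chi_{B_i}(T^r x) - \mu(B_i) \Big| < \frac{\epsilon}{N_i},\ i = 1 \ldots j - 1 \Big\}$$ 

then $\mu(F_j) > 1 - \eta_j$. 
Define 

$$E_K = \Big\{ x \ \Big|\ \frac{1}{K} \sum_{r=0}^{K - N_j} \chi_{F_j}(T^r x) > 1 - \eta_j,\ \Big| \frac{1}{K} \sum_{r=0}^{K - N_j} \chi_{B_j}(T^r x) - \mu(B_j) \Big| < \frac{\epsilon}{N_j},\ j = 1 \ldots \ell \Big\}.$$ 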
r=0 r=0 3 
From the ergodicity, we know that there is a $K_0$ such that for any $K > K_0$ we have 
$\mu(E_K) > 1 - \rho_2$. Fix $K > K_0$; we shall show that we can cover $E_K$ by "few" $(\mathcal{U}, [0, K - 1])$-
names. For a fixed $x \in E_K$ denote 

$$A_j = \{0 \le m \le K - N_j \mid T^m x \in B_j\}$$ 

and for every $i \in A_j$, let $I_i^j = [i, i + N_j - 1]$. We claim that the collections $\{I_i^j\}_{i \in A_j}$, 
$j = 1 \ldots \ell$, satisfy conditions (a), (b), (c) from the combinatorial lemma (lemma 4.3), 
with $\lambda_j = N_j \mu(B_j)$. To see this, note first that because the height of the $j$'th tower was 
$N_j + 1$, each collection $\{I_i^j\}_{i \in A_j}$ is separated. 

(a) By definition $|I_i^j| = N_j$. 

(b) Because $x \in E_K$, we know that $|\frac{1}{K} \sum_{r=0}^{K - N_j} \chi_{B_j}(T^r x) - \mu(B_j)| < \frac{\epsilon}{N_j}$ and thus 
$|\frac{N_j |A_j|}{K} - \lambda_j| < \epsilon$. So $\{I_i^j\}_{i \in A_j}$ forms a $(\lambda_j, \epsilon)$-separated cover of $[0, K - 1]$. 

(c) For $1 < r \le \ell$, we know from the fact that $x \in E_K$ that $\frac{1}{K} \sum_{s=0}^{K - N_r} \chi_{F_r}(T^s x) > 1 - \eta_r$ 
and thus we have $\frac{1}{K} \sum_{s=0}^{K - N_r} \chi_{F_r^c}(T^s x) < \eta_r$. If we use the definition of $F_r$, this becomes 

$$\frac{1}{K} \# \Big\{ 0 \le s \le K - N_r \ \Big|\ \exists\, 1 \le j \le r - 1,\ \Big| \frac{1}{N_r} \sum_{i=0}^{N_r - N_j} \chi_{B_j}(T^{i+s} x) - \mu(B_j) \Big| \ge \frac{\epsilon}{N_j} \Big\} < \eta_r$$ 

or equivalently 

$$\# \Big\{ 0 \le s \le K - N_r \ \Big|\ \exists\, 1 \le j \le r - 1,\ \Big| \frac{N_j}{N_r} \# \{0 \le i \le N_r - N_j \mid i + s \in A_j\} - \lambda_j \Big| \ge \epsilon \Big\} < \eta_r K$$ 

so if we choose $1 \le j < r \le \ell$, we must have 

$$\# \Big\{ J \subset [0, K - 1] \ \Big|\ |J| = N_r,\ \Big| \frac{N_j}{N_r} \# \{i \in A_j \mid I_i^j \subset J\} - \lambda_j \Big| \ge \epsilon \Big\} < \eta_r K.$$ 

In words, the number of subintervals $J$ of $[0, K - 1]$ of length $N_r$ which are not $(\lambda_j, \epsilon)$-
separately covered by those $I_i^j$ which are contained in $J$ is less than $\eta_r K$, as we wanted. 
Using the combinatorial lemma, we can choose for every $x \in E_K$ a separated collection 
$\{\{I_i^j(x)\}_{i \in \tilde{A}_j}\}_{j=1}^{\ell}$ that covers at least $K(1 - \nu_1(\vec{\lambda}) - \epsilon_0)$ elements of $[0, K - 1]$. Because 
these collections are separated, there is a 1-1 correspondence between them and their 
complements. Hence, the number of such covers is less than 

$$\psi(K, \lambda_j, \epsilon_0) = \sum_{j \le K(\nu_1 + \epsilon_0)} \binom{K}{j} \quad (**)$$ 



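Fix such a collection $\{\{I_i^j\}_{i \in \tilde{A}_j}\}_{j=1}^{\ell}$ and set 

$$C = \{x \in E_K \mid \{I_i^j(x)\} = \{I_i^j\}\}.$$ 

From the construction we see that for every $1 \le j \le \ell$ we can cover $B_j$ by no more 
than $2^{N_j(h_0 + \epsilon_0)}$ $(\mathcal{U}, [0, N_j - 1])$-names, thus we can cover $C$ by no more than $2^{N_j(h_0 + \epsilon_0)}$ 
$(\mathcal{U}, I_i^j)$-names. So the number of $(\mathcal{U}, [0, K - 1])$-names needed to cover $C$ is at most 

$$\prod_{j=1}^{\ell} \big( 2^{N_j(h_0 + \epsilon_0)} \big)^{|\tilde{A}_j|} \cdot M^{K(\nu_1 + \epsilon_0)} = 2^{(\sum_j N_j |\tilde{A}_j|)(h_0 + \epsilon_0)} \cdot M^{K(\nu_1 + \epsilon_0)} \le 2^{K(h_0 + \epsilon_0)} \cdot M^{K(\nu_1 + \epsilon_0)}$$ 

Finally we get from this and (**) that 

$$\mathcal{N}(\mathcal{U}_0^{K-1}, \rho_2) \le \psi(K, \lambda_j, \epsilon_0) \cdot 2^{K(h_0 + \epsilon_0)} \cdot M^{K(\nu_1 + \epsilon_0)}$$ 

and so 

$$\frac{1}{K} \log \mathcal{N}(\mathcal{U}_0^{K-1}, \rho_2) \le \frac{1}{K} \log \psi(K, \lambda_j, \epsilon_0) + h_0 + \epsilon_0 + \nu_1 \log M + \epsilon_0 \log M.$$ 

If, in the construction of the towers, we choose $\delta$ small enough and $N_i$ large enough, we 
can ensure that $\lambda_j = N_j \mu(B_j) > \frac{1 - \rho_1}{2}$ and thus $1 - \lambda_j < \frac{1 + \rho_1}{2} \Rightarrow \nu_1 < \left( \frac{1 + \rho_1}{2} \right)^{\ell}$, and so, from 
(*), we have that 

$$\nu_1 \log M < \epsilon_0, \qquad \nu_1 + \epsilon_0 < \tfrac{1}{2}$$ 

hence, from lemma 2.3, 

$$\psi(K, \lambda_j, \epsilon_0) \le 2^{K \cdot H(\nu_1 + \epsilon_0)}$$ 

hence 

$$\frac{1}{K} \log \mathcal{N}(\mathcal{U}_0^{K-1}, \rho_2) \le h_0 + \epsilon_0 (2 + \log M) + H\Big( \big( \tfrac{1 + \rho_1}{2} \big)^{\ell} + \epsilon_0 \Big) \Rightarrow$$ 

$$\limsup_K \frac{1}{K} \log \mathcal{N}(\mathcal{U}_0^{K-1}, \rho_2) \le h_0 + \epsilon_0 (2 + \log M) + H\Big( \big( \tfrac{1 + \rho_1}{2} \big)^{\ell} + \epsilon_0 \Big)$$ 

Letting $\ell \to \infty$ and $\epsilon_0 \to 0$ we get 

$$\limsup_K \frac{1}{K} \log \mathcal{N}(\mathcal{U}_0^{K-1}, \rho_2) \le h_0$$ 

as desired. 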



□ 



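After proving theorem 4.2, we can define, for an ergodic m.t.d.s $(X, \mathcal{B}, \mu, T)$ and a 
cover $\mathcal{U} = \{U_1 \ldots U_M\}$ of $X$, a notion of measure theoretical entropy in the following way: 

$h_\mu^e(\mathcal{U}, T) = \lim \frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \epsilon)$ where $0 < \epsilon < 1$. 

Often we omit $T$ and write $h_\mu^e(\mathcal{U})$. 

4.4. Theorem. $h_\mu^e(\mathcal{U}) = h_\mu^+(\mathcal{U})$ 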






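Proof. As before, if the system is periodic then $h_\mu^e(\mathcal{U}) = h_\mu^+(\mathcal{U}) = 0$. We assume, then, 
that the system is aperiodic. For every partition $\alpha \succeq \mathcal{U}$, $n \in \mathbb{N}$ and $0 < \epsilon < 1$, we have 
that $\mathcal{N}(\mathcal{U}_0^{n-1}, \epsilon) \le \mathcal{N}(\alpha_0^{n-1}, \epsilon)$ and therefore 

$$h_\mu^e(\mathcal{U}) = \lim \frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \epsilon) \le \lim \frac{1}{n} \log \mathcal{N}(\alpha_0^{n-1}, \epsilon) = h_\mu(\alpha)$$ 

$$\Rightarrow h_\mu^e(\mathcal{U}) \le h_\mu^+(\mathcal{U})$$ 

To prove the other inequality, we shall show that for a given $0 < \epsilon < \frac{1}{4}$ and $n \in \mathbb{N}$ we 
have: 

$$h_\mu^+(\mathcal{U}) \le \frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \epsilon) + \sqrt{\epsilon} \cdot \log M + H(\sqrt{\epsilon}). \quad (*)$$ 

Once we prove (*), we are done, for letting $n \to \infty$ we get $h_\mu^+(\mathcal{U}) \le h_\mu^e(\mathcal{U}) + \sqrt{\epsilon} \cdot \log M + 
H(\sqrt{\epsilon})$ and now, letting $\epsilon \to 0$, we get $h_\mu^+(\mathcal{U}) \le h_\mu^e(\mathcal{U})$ as desired. 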

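Proof of (*): choose $\delta > 0$ such that $\epsilon + \delta < \frac{1}{4}$ and find a good base $(\beta, \tilde{B}, B)$ for $(\mathcal{U}, n, \epsilon, \delta)$. 
(Now we take $\tilde{B}$ to be a base for a strong Rohlin tower of height $n$ and error $< \delta$ and 
not of height $n + 1$ as before). Set $N = \mathcal{N}(\mathcal{U}_0^{n-1}, \epsilon)$, so $B$ is the union of $N$ elements of 
$\beta|_{\tilde{B}}$. We index these elements by sequences $i_0 \ldots i_{n-1}$, such that if $B_{i_0 \ldots i_{n-1}}$ is one, then 
$T^j(B_{i_0 \ldots i_{n-1}}) \subset U_{i_j}$ for every $0 \le j \le n - 1$. We have that $\mu(X \setminus \bigcup_{i=0}^{n-1} T^i(B)) < \epsilon + \delta$. 
Let $\tilde{\alpha} = \{A_1 \ldots A_M\}$ be the partition of 

$$E = \bigcup_{i=0}^{n-1} T^i(B)$$ 

defined by 

$$A_m = \bigcup \{T^j(B_{i_0 \ldots i_{n-1}}) \mid j \in [0, n - 1],\ i_j = m\}.$$ 

Note that $A_m \subset U_m$ for every $1 \le m \le M$. Extend $\tilde{\alpha}$ to a partition, $\alpha$, of $X$, refining 
$\mathcal{U}$, in some way. Set $\eta^2 = \epsilon + \delta$ and define for every $k > n$, $f_k(x) = \frac{1}{k} \sum_{j=0}^{k-1} \chi_E(T^j x)$. We 
have that $0 \le f_k \le 1$ and $\int f_k > 1 - \eta^2$, so if we denote 

$$G_k = \{x \mid f_k(x) > 1 - \eta\}$$ 

then 

$$\eta \cdot \mu(G_k^c) \le \int_{G_k^c} (1 - f_k) \le \int (1 - f_k) < \eta^2 \ \Rightarrow\ \mu(G_k) > 1 - \eta.$$ 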

We shall show that we can cover $G_k$ by "few" $(\alpha, [0, k - 1])$-names. Partition $G_k$ according 
to the values of $0 \le i \le k - n$ such that $T^i x \in B$. Note that if $x \in G_k$ and $0 \le i_1 < 
\cdots < i_m \le k - n$ are the times in which $x$ visits $B$, then the collection $\{[i_j, i_j + n - 1]\}_{j=1}^{m}$ 
covers all but at most $\eta k + 2n$ elements of $[0, k - 1]$. Because each element of this partition 
defines a collection of subintervals of $[0, k - 1]$, of length $n$, that covers all but at most 
$\eta k + 2n$ elements of $[0, k - 1]$, in a 1-1 manner, we have that the number of elements 
in the partition of $G_k$ is at most 

$$\psi(k, n, \eta) = \sum_{j \le \eta k + 2n} \binom{k}{j}$$ 

We fix an element $C$ of this partition of $G_k$ and want to estimate the number of $(\alpha, [0, k - 
1])$-names needed to cover it. If $0 \le i_1 < \cdots < i_m \le k - n$ are the times elements of 
$C$ visit $B$, then for each $j$ we need at most $N$ $(\alpha, [i_j, i_j + n - 1])$-names to cover $C$. Because the 
size of $[0, k - 1] \setminus \bigcup_j [i_j, i_j + n - 1]$ is at most $\eta k + 2n$, we need at most $N^{k/n} \cdot M^{\eta k + 2n}$ 
$(\alpha, [0, k - 1])$-names to cover $C$. Finally, we have that we can cover $G_k$ by no more than 

$$\psi(k, n, \eta) \cdot N^{k/n} \cdot M^{\eta k + 2n}$$ 

$(\alpha, [0, k - 1])$-names. Because $\mu(G_k) > 1 - \eta$, this means that: 

$$\frac{1}{k} \log \mathcal{N}(\alpha_0^{k-1}, \eta) \le \frac{1}{k} \log \psi(k, n, \eta) + \frac{1}{n} \log N + \Big( \eta + \frac{2n}{k} \Big) \log M.$$ 

Recall that once $\eta + \frac{2n}{k} < \frac{1}{2}$, we have $\psi(k, n, \eta) \le 2^{k \cdot H(\eta + \frac{2n}{k})}$ and so 

$$h_\mu(\alpha) = \lim_k \frac{1}{k} \log \mathcal{N}(\alpha_0^{k-1}, \eta) \le \frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \epsilon) + \eta \cdot \log M + H(\eta)$$ 

so 

$$h_\mu^+(\mathcal{U}) \le \frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \epsilon) + \sqrt{\epsilon + \delta} \cdot \log M + H(\sqrt{\epsilon + \delta})$$ 

Letting $\delta \to 0$ we get 

$$h_\mu^+(\mathcal{U}) \le \frac{1}{n} \log \mathcal{N}(\mathcal{U}_0^{n-1}, \epsilon) + \sqrt{\epsilon} \cdot \log M + H(\sqrt{\epsilon})$$ 

as desired. 



□ 

4.5. Theorem. $h_\mu^+(\mathcal{U}) = h_\mu^-(\mathcal{U})$ 

We already know that $h_\mu^+(\mathcal{U}) \ge h_\mu^-(\mathcal{U})$ (Proposition 3.6), so we only need to prove the 
other inequality. Before we turn to the proof, let us present some terminology and prove 
a combinatorial lemma. 

Let $\Lambda$ be a finite alphabet of $M$ letters, $k, n \in \mathbb{N}$, $k \gg n$, $0 < \delta < 1$ and $\omega = \omega_0^{k-1}$ a 
word of length $k$ on $\Lambda$. (The symbol $a_r^s$ stands for $a_r \ldots a_s$). Denote $\Gamma = \Lambda^n$. 

• An $(n, k, \delta)$-packing is a pair $C = (i_0^{m-1}, \gamma_0^{m-1})$ where $0 \le i_j \le k - n$, $\gamma_j \in \Gamma$, $j = 
0 \ldots m - 1$, $i_j + n - 1 < i_{j+1}$ and $\frac{mn}{k} > 1 - \delta$. (We think of an $(n, k, \delta)$-packing 
as instructions to "almost" write a word of length $k$: we just fill it with the $\gamma_j$'s, 
where $\gamma_j$ starts in the $i_j$ letter, and there will be no more than $\delta k$ letters to add.) 

• An $(n, k, \delta)$-packing for $\omega$ is an $(n, k, \delta)$-packing $C = (i_0^{m-1}, \gamma_0^{m-1})$ such that 
$\omega_{i_j}^{i_j + n - 1} = \gamma_j$ for every $0 \le j \le m - 1$. 




• If $\mu_1, \mu_2$ are probability distributions on $\Gamma$ then 

$$\|\mu_1 - \mu_2\| = \max_{\gamma} |\mu_1(\gamma) - \mu_2(\gamma)|.$$ 

• An $(n, k, \delta)$-packing $C = (i_0^{m-1}, \gamma_0^{m-1})$ induces a probability distribution on $\Gamma$, 
denoted by $P_C$, by the formula $P_C(\gamma) = \frac{1}{m} \#\{0 \le j \le m - 1 \mid \gamma = \gamma_j\}$. 

• If $\mu$ is a probability distribution on $\Gamma$ and $C$ is an $(n, k, \delta)$-packing, then we say 
that $C$ is $(n, k, \delta, \mu)$ if $\|\mu - P_C\| < \delta$. We say that $\omega$ is $(n, k, \delta, \mu)$ if there is an 
$(n, k, \delta)$-packing for $\omega$ which is $(n, k, \delta, \mu)$. 
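
The packing definitions translate directly into small checks. The following is a minimal sketch under stated assumptions: words and blocks are tuples of letters, `delta` is passed as an exact `Fraction`, distributions are dictionaries from blocks to weights, and all function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def is_packing(starts, blocks, n, k, delta):
    """Check the conditions defining an (n, k, delta)-packing."""
    m = len(starts)
    if m != len(blocks) or any(len(g) != n for g in blocks):
        return False
    if any(not (0 <= i <= k - n) for i in starts):
        return False
    # consecutive blocks must not touch: i_j + n - 1 < i_{j+1}
    if any(starts[j] + n - 1 >= starts[j + 1] for j in range(m - 1)):
        return False
    return Fraction(m * n, k) > 1 - delta


def is_packing_for(word, starts, blocks, n, delta):
    """Packing whose blocks actually occur in `word` at the given starts."""
    return is_packing(starts, blocks, n, len(word), delta) and all(
        tuple(word[i:i + n]) == tuple(g) for i, g in zip(starts, blocks)
    )


def induced_distribution(blocks):
    """P_C(gamma) = (1/m) * #{j : gamma_j = gamma}."""
    m = len(blocks)
    return {g: Fraction(c, m) for g, c in Counter(map(tuple, blocks)).items()}


def sup_distance(p, q):
    """||p - q|| = max over gamma of |p(gamma) - q(gamma)|."""
    return max(abs(p.get(g, 0) - q.get(g, 0)) for g in set(p) | set(q))
```

A packing $C$ is then $(n, k, \delta, \mu)$ exactly when `sup_distance(mu, induced_distribution(blocks)) < delta`.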

4.6. Lemma. If $\mu$ is a probability distribution on $\Gamma$, with "average entropy" 

$$h_0 = -\frac{1}{n} \sum_{\gamma} \mu(\gamma) \log \mu(\gamma)$$ 

then there exists a positive function $\varphi(\delta)$, such that $\varphi(\delta) \to 0$ as $\delta \to 0$ and such that if 
$0 < \delta < \frac{1}{2}$, then for any $k > n$, the number of words $\omega \in \Lambda^k$ which are $(n, k, \delta, \mu)$ is at 
most $2^{k(h_0 + \varphi(\delta))}$. 

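Proof. Fix $k > n$. We want to estimate the number of words $\omega = \omega_0^{k-1} \in \Lambda^k$ that are 
$(n, k, \delta, \mu)$. For every such word $\omega$ we can choose an $(n, k, \delta)$-packing $C = (i_0^{m-1}, \gamma_0^{m-1})$ for $\omega$ 
which is $(n, k, \delta, \mu)$. In this way we define a map 

$$\pi : \{\omega \in \Lambda^k \mid \omega \text{ is } (n, k, \delta, \mu)\} \to \{C \mid C \text{ is an } (n, k, \delta, \mu)\text{-packing}\}$$ 

If $C = (i_0^{m-1}, \gamma_0^{m-1})$ is an $(n, k, \delta)$-packing, then $\frac{mn}{k} > 1 - \delta$. This means that $|\pi^{-1}(C)| \le 
|\Lambda|^{\delta k} = M^{\delta k}$. So we have that 

$$\#\{\omega \in \Lambda^k \mid \omega \text{ is } (n, k, \delta, \mu)\} \le M^{\delta k}\, \#\{C \mid C \text{ is an } (n, k, \delta, \mu)\text{-packing}\}.$$ 

Let us now estimate the number of $(n, k, \delta, \mu)$-packings $C = (i_0^{m-1}, \gamma_0^{m-1})$: 
The number of sequences $i_0^{m-1}$ such that $0 \le i_j \le k - n$, $i_j + n - 1 < i_{j+1}$ and $\frac{mn}{k} > 1 - \delta$ 
is at most $\sum_{j \le \delta k} \binom{k}{j}$. From lemma 2.3 we know that for $\delta < \frac{1}{2}$ this sums to something 
$\le 2^{H(\delta) k}$. 

Fix such a sequence $i_0^{m-1}$. Let us now estimate the number of sequences $\gamma_0^{m-1}$ such that 
the $(n, k, \delta)$-packing $C = (i_0^{m-1}, \gamma_0^{m-1})$ is $(n, k, \delta, \mu)$. 

Denote $\nu = \otimes_1^m \mu$, the product measure on $\Gamma^m$. If $\gamma_0^{m-1} \in \Gamma^m$, then 

$$\nu(\gamma_0^{m-1}) = \prod_{\gamma \in \Gamma} \mu(\gamma)^{\#\{0 \le j \le m - 1 \mid \gamma = \gamma_j\}} = 2^{\sum_{\{\gamma \mid \mu(\gamma) \ne 0\}} \#\{0 \le j \le m - 1 \mid \gamma = \gamma_j\} \cdot \log \mu(\gamma)}$$ 

$$= 2^{m \sum_{\{\gamma \mid \mu(\gamma) \ne 0\}} \frac{1}{m} \#\{0 \le j \le m - 1 \mid \gamma = \gamma_j\} \cdot \log \mu(\gamma)}$$ 

Now, the function $f : \{(x_\gamma)_{\gamma \in \Gamma} \in \mathbb{R}^\Gamma \mid \sum x_\gamma = 1\} \to \mathbb{R}$, defined by 

$$f(x_\gamma) = \sum_{\{\gamma \mid \mu(\gamma) \ne 0\}} x_\gamma \cdot \log \mu(\gamma)$$ 

is continuous and so there is a positive function $\psi(\delta)$, such that $\psi(\delta) \to 0$ as $\delta \to 0$ and if 
$\max_\gamma |x_\gamma - \mu(\gamma)| < \delta$, then $|f(x_\gamma) - f(\mu(\gamma))| < \psi(\delta)$ (note that $\psi$ depends only on $n$, $\mu$). 
So if $\gamma_0^{m-1} \in \Gamma^m$ is such that $C = (i_0^{m-1}, \gamma_0^{m-1})$ is an $(n, k, \delta, \mu)$-packing, it follows that 

$$\nu(\gamma_0^{m-1}) \ge 2^{m (\sum_{\{\gamma \mid \mu(\gamma) \ne 0\}} \mu(\gamma) \log \mu(\gamma) - \psi(\delta))} \ge 2^{k(-h_0 - \frac{\psi(\delta)}{n})}$$ 

where the last inequality follows from the fact that $m \le \frac{k}{n}$ and the definition of $h_0$. We 
conclude that an upper bound for the number of such sequences $\gamma_0^{m-1}$ is $2^{k(h_0 + \frac{\psi(\delta)}{n})}$. If we 
collect these estimations, we get to the conclusion that for $0 < \delta < \frac{1}{2}$ 

$$\#\{\omega \in \Lambda^k \mid \omega \text{ is } (n, k, \delta, \mu)\} \le M^{\delta k} \cdot 2^{H(\delta) k} \cdot 2^{k(h_0 + \frac{\psi(\delta)}{n})} \le 2^{k(h_0 + \frac{\psi(\delta)}{n} + H(\delta) + \delta \cdot \log M)}$$ 

so $\varphi(\delta) = \frac{\psi(\delta)}{n} + H(\delta) + \delta \cdot \log M$ is our desired function. 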

□ 

Proof (of theorem 4.5): We want to show that for an ergodic system $(X, \mathcal{B}, \mu, T)$ and 
a cover $\mathcal{U} = \{U_1 \ldots U_M\}$ of $X$, we have $h_\mu^+(\mathcal{U}) \le h_\mu^-(\mathcal{U})$. As before, if the system is 
periodic, then, from the ergodicity, it must be a cyclic permutation on a finite set of 
atoms. Therefore $h_\mu^+(\mathcal{U}) = h_\mu^-(\mathcal{U}) = 0$. In the aperiodic case we can use the Strong 
Rohlin Lemma. 

Let $\epsilon > 0$. We shall show that $h_\mu^+(\mathcal{U}) \le h_\mu^-(\mathcal{U}) + 2\epsilon$. From the definition of $h_\mu^-(\mathcal{U})$, we 
can find $n \in \mathbb{N}$ and a partition $\beta \succeq \mathcal{U}_0^{n-1}$ such that $\frac{1}{n} H_\mu(\beta) < h_\mu^-(\mathcal{U}) + \epsilon$. As $\beta \succeq \mathcal{U}_0^{n-1}$, 
we can index the elements of $\beta$ by sequences $i_0^{n-1} = i_0 \ldots i_{n-1}$, such that if $B_{i_0^{n-1}}$ is one, 
then $T^j B_{i_0^{n-1}} \subset U_{i_j}$, $j = 0 \ldots n - 1$. We can assume that each sequence $i_0^{n-1}$ corresponds 
to at most one element of $\beta$, for otherwise we could unite these elements and get a 
coarser partition $\beta'$, still refining $\mathcal{U}_0^{n-1}$, such that $\frac{1}{n} H_\mu(\beta') \le \frac{1}{n} H_\mu(\beta) < h_\mu^-(\mathcal{U}) + \epsilon$. Set 
$\Gamma = \{1 \ldots M\}^n$. So the elements of $\beta$ are indexed by $\Gamma$ (if $\gamma \in \Gamma$ does not correspond to 
an element of $\beta$ in the above way, we set $B_\gamma = \emptyset$). In this way, the partition $\beta$ defines a 
probability distribution $\nu$ on $\Gamma$, defined by $\nu(\gamma) = \mu(B_\gamma)$, and we have that $h_0 = \frac{1}{n} H_\mu(\beta)$ 
is the "average entropy" (see Lemma 4.6) of $\nu$. 

Choose $\delta > 0$ (in a manner specified later) and let $F$ be a base for a strong Rohlin tower 
(with respect to $\beta$) of height $n$ and error $< \delta^2$. Denote the atoms of $\beta|_F$ by $\tilde{B}_\gamma$, $\gamma \in \Gamma$ 
(where $\tilde{B}_\gamma = B_\gamma \cap F$), and define a partition $\tilde{\alpha} = \{A_1 \ldots A_M\}$ of $E = \bigcup_{i=0}^{n-1} T^i F$ by 
$A_m = \bigcup \{T^j \tilde{B}_{i_0^{n-1}} \mid j \in \{0 \ldots n - 1\},\ i_j = m\}$. Note that $A_m \subset U_m$. Extend $\tilde{\alpha}$ to a 
partition $\alpha$ of $X$ refining $\mathcal{U}$, in some way. The set of indices of elements of $\alpha$, $\Lambda$ (the 
alphabet in which $\alpha$-names are written), contains $\{1 \ldots M\}$ and we can always build $\alpha$ 
such that $|\Lambda| \le 2M$. We slightly abuse our notation and denote $\Gamma = \Lambda^n$. In this way, $\nu$ 
is still a probability distribution on $\Gamma$. 
Claim: If $\delta$ is small enough, then $h_\mu(\alpha) \le h_0 + \epsilon$. 
Once we prove this claim, we are done, because then 

$$h_\mu^+(\mathcal{U}) \le h_\mu(\alpha) \le h_0 + \epsilon < h_\mu^-(\mathcal{U}) + 2\epsilon.$$ 






Proof of claim: For k » n, we look at the function fk(x) = | Ylo 1 XE(T j x). We have 
that < f k < 1 and / f k > 1 - 8 2 . Therefore 

8 ■ »({x\l - f k (x) > 1 - 8}) < [ l-f k <[l-fk<S 2 

J{x\l-Mx)>l-S} J 

=>»({x\f k (.x)>l-6})>l-6. 
Denote, G\ = {x\fk(x) > 1 — 8}. For x G G\, there are at most 8k times < i < k — 1, 
such that T^a; Define 



il 



1 \ - 



G 2 = {x\ \t2_^Xa(Tx) - /m(A)\ <5,Aep\ F U {F}}. 
K o 

Let us see what we can say about the $(\alpha, [0, k-1])$-name of an element $x$ of $G_1 \cap G_2$. Fix such an $x$ and denote by $i_0 < \dots < i_{m-1}$ the times between $0$ and $k - n$ in which $x$ visits $F$. We have that $0 \le i_j \le k - n$ and $i_j + n - 1 < i_{j+1}$ (that is because the height of the tower is $n$). Except for at most $2n$ times ($n$ at the beginning and $n$ at the end), $x$ visits $E$ exactly in the times $i_j \dots i_j + n - 1$, $j = 0 \dots m-1$. Therefore, we must have

$$n \cdot m \ge (1 - \delta)k - 2n \ \Longrightarrow\ \frac{n \cdot m}{k} \ge 1 - \Big(\delta + \frac{2n}{k}\Big)$$

Denote the $(\alpha, [0, k-1])$-name of $x$ by $\omega = \omega_0^{k-1} \in \Lambda^k$, and $\gamma_j = \omega_{i_j} \dots \omega_{i_j + n - 1} \in \Gamma$, $j = 0 \dots m-1$. We have that $C = (\gamma_0^{m-1})$ is an $(n, k, \delta + \frac{2n}{k})$-packing for $\omega$. Let us now see what we can say about the distribution $P_C$ this packing induces on $\Gamma$.
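For $0 \le r \le k - n$, we have that $T^r x \in \tilde B_\gamma$ if and only if there is a $0 \le j \le m - 1$ such that $r = i_j$ and $\gamma = \gamma_j$. Therefore, because $x \in G_2$:

• $\forall \gamma \in \Gamma$: $\big|\frac1k \#\{0 \le j \le m-1 \mid \gamma = \gamma_j\} - \mu(\tilde B_\gamma)\big| < \delta$.
• $\big|\frac{m}{k} - \mu(F)\big| < \delta$.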
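
To make the bookkeeping concrete, the following sketch computes, for a finite word and a list of block starting times, the fraction of the word covered by the blocks and the empirical distribution that the blocks induce. It is an illustration under assumptions: the precise definition of a packing is the one given with Lemma 4.6, and it is taken here to mean pairwise disjoint blocks of length $n$ covering at least a $(1-\delta)$ fraction of the $k$ positions. The function name `packing_stats` and the sample word are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def packing_stats(word, n, starts):
    """Return (covered fraction, block distribution) for length-n blocks of `word`.

    starts: increasing start times i_0 < ... < i_{m-1} with i_j + n - 1 < i_{j+1}
            and i_{m-1} <= len(word) - n, so the blocks are disjoint and fit.
    """
    k = len(word)
    for a, b in zip(starts, starts[1:]):
        assert a + n - 1 < b, "blocks overlap"
    assert all(0 <= i <= k - n for i in starts), "block does not fit in the word"
    blocks = [tuple(word[i:i + n]) for i in starts]
    m = len(blocks)
    covered = Fraction(n * m, k)                      # the quantity n*m/k
    counts = Counter(blocks)
    distribution = {g: Fraction(c, m) for g, c in counts.items()}
    return covered, distribution


# Hypothetical name of length k = 10 over the alphabet {1, 2}, blocks of length n = 3.
word = [1, 1, 2, 1, 1, 2, 2, 1, 2, 2]
covered, dist = packing_stats(word, 3, [0, 3, 7])
```

In this toy example the three blocks cover nine of the ten positions, and the induced distribution gives weight $2/3$ to the block $112$ and $1/3$ to the block $122$.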

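Note that $\mu(F)$ is bounded away from $0$ (indeed $n\mu(F) = \mu(E) > 1 - \delta^2$), so if $\delta$ is sufficiently small, we can guarantee that $\big|\frac{k}{m} - \frac{1}{\mu(F)}\big|$ would be arbitrarily small, and in turn we can guarantee that for every $\gamma \in \Gamma$

$$\Big|\frac1m \#\{0 \le j \le m-1 \mid \gamma = \gamma_j\} - \frac{\mu(\tilde B_\gamma)}{\mu(F)}\Big|$$

would be arbitrarily small. This is to say that $\|P_C - \nu\|$ is arbitrarily small. We see that there is a positive function $\psi(\delta)$, independent of $k$, such that $\psi(\delta) \to 0$ as $\delta \to 0$ and such that, if $x \in G_1 \cap G_2$ and $\omega$ is its $(\alpha, [0, k-1])$-name, then $\omega$ is $(n, k, \psi(\delta) + \frac{2n}{k}, \nu)$.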
Remember the function $\varphi$ from Lemma 4.6. There is an $\eta_0 > 0$ such that for every $0 < \eta < \eta_0$, $\varphi(\eta) < \epsilon$. Choose $k$ to be large enough so that $\frac{2n}{k} < \frac{\eta_0}{2}$ and the error $\delta$ of the tower to be so small that $\psi(\delta) < \frac{\eta_0}{2}$, and conclude from Lemma 4.6 that the number of $(\alpha, [0, k-1])$-names of elements of $G_1 \cap G_2$ is at most $2^{k(h_0 + \epsilon)}$. From the ergodicity, we know that for large enough $k$, $\mu(G_1 \cap G_2) > 1 - 2\delta$, so we have

$$h_\mu(\alpha) = \lim \frac1k \log \mathcal N(\alpha_0^{k-1}, 2\delta) \le h_0 + \epsilon$$

as desired.
□

Remarks: 

• If $(X, T)$ is totally ergodic, i.e. $(X, T^n)$ is ergodic for every $n \in \mathbb N$, then we can look at expressions like $h^e_\mu(\mathcal U_0^{n-1}, T^n)$. It follows from the definition that $h^e_\mu(\mathcal U, T) = \frac1n h^e_\mu(\mathcal U_0^{n-1}, T^n)$. This enables us to prove the last theorem without any hard work. We know from Theorem 4.4 that $h^e_\mu(\mathcal U, T) = h^+_\mu(\mathcal U, T)$ and therefore $h^+_\mu(\mathcal U, T) = \frac1n h^+_\mu(\mathcal U_0^{n-1}, T^n)$. But then Proposition 3.6 (which is elementary) gives $h^-_\mu(\mathcal U, T) = \lim \frac1n h^+_\mu(\mathcal U_0^{n-1}, T^n) = h^+_\mu(\mathcal U, T)$, and this gives the desired result.

• The definitions of $h^+_\mu(\mathcal U)$, $h^-_\mu(\mathcal U)$ were introduced in [R] and discussed also in [Ye], [HMRY]. There, a proof of their equality was given only in the case where $(X, T)$ is a t.d.s and $\mathcal U$ is an open cover. The proof was based on a reduction to a uniquely ergodic case and then a use of a variational inequality, proved in [GW].

• The definition of $h^e_\mu(\mathcal U)$ is new. This definition helps us to prove directly a slight generalization of the variational inequality, proved in [GW] and mentioned above, to the non-topological case (Theorem 6.1).

• The proofs of Theorems 4.2, 4.4, 4.5 and Lemma 4.6 are based on ideas of B. Weiss and E. Glasner.

5. Ergodic decomposition for $h^+_\mu$, $h^-_\mu$

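5.1. Theorem. (Proposition 5 in [HMRY]): Let $\mathcal U = \{U_1 \dots U_M\}$ be a cover of $X$, and $\mu = \int \mu_x \, d\mu(x)$ the ergodic decomposition of $\mu$ with respect to $T$. Then

$$h^+_\mu(\mathcal U, T) = \int h^+_{\mu_x}(\mathcal U, T)\, d\mu(x), \qquad h^-_\mu(\mathcal U, T) = \int h^-_{\mu_x}(\mathcal U, T)\, d\mu(x).$$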

5.2. Corollary. $h^+_\mu(\mathcal U) = h^-_\mu(\mathcal U)$.

Proof. It follows immediately from the above and the ergodic case (Theorem 4.5). □

From now on we will denote the number $h^+_\mu(\mathcal U, T) = h^-_\mu(\mathcal U, T)$ ($= h^e_\mu(\mathcal U, T)$ in the ergodic case) simply by $h_\mu(\mathcal U, T)$ or $h_\mu(\mathcal U)$ or $h(\mathcal U)$, when no ambiguity can occur.

6. Variational relations 

As always, let $\mathcal U = \{U_1 \dots U_M\}$ be a cover of the m.t.d.s $(X, \mathcal B, \mu, T)$. We can define the "combinatorial entropy" of $\mathcal U$ as

$$h_c(\mathcal U, T) = \lim_{n} \frac1n \log \mathcal N(\mathcal U_0^{n-1})$$

where $\mathcal N(\mathcal V)$ is the minimum number of elements of $\mathcal V$ needed to cover the whole space. Note that the sequence $\log \mathcal N(\mathcal U_0^{n-1})$ is sub-additive, hence the limit exists. If $(X, T)$ is a t.d.s and $\mathcal U$ is an open cover then we denote $h_{top}(\mathcal U, T) = h_c(\mathcal U, T)$.
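
The quantities entering the definition of the combinatorial entropy can be computed by brute force on a finite toy system. The sketch below is only meant to illustrate the definition of the minimal cover number of the joined cover and its sub-multiplicativity; the functions `join` and `min_cover_size`, the rotation on six points and the two-element cover are assumptions chosen for the example. In a finite system the limit itself is $0$.

```python
from itertools import combinations, product


def join(cover, T, n, points):
    """Nonempty elements of U_0^{n-1}: sets of x with T^j x in U_{i_j} for j = 0..n-1."""
    elements = set()
    for word in product(range(len(cover)), repeat=n):
        elem = []
        for x in points:
            y, ok = x, True
            for i in word:
                if y not in cover[i]:
                    ok = False
                    break
                y = T(y)               # move one step along the orbit of x
            if ok:
                elem.append(x)
        if elem:
            elements.add(frozenset(elem))
    return list(elements)


def min_cover_size(family, points):
    """N(V): the least number of members of `family` whose union is all of `points`."""
    target = set(points)
    for r in range(1, len(family) + 1):
        for choice in combinations(family, r):
            if set().union(*choice) == target:
                return r
    raise ValueError("family does not cover the space")


# Toy system (hypothetical): rotation by one step on Z/6 with a two-element cover.
points = range(6)
T = lambda x: (x + 1) % 6
cover = [{0, 1, 2, 3}, {3, 4, 5, 0}]
N = {n: min_cover_size(join(cover, T, n, points), points) for n in (1, 2, 3)}

# Sub-additivity of log N is the same as sub-multiplicativity of N.
assert N[2] <= N[1] * N[1]
assert N[3] <= N[1] * N[2]
```

The exhaustive search is exponential in the size of the family, so this is usable only for very small examples.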
The next theorem was proved in [GW] for topological dynamical systems and measurable covers. We give here a simple proof for the non-topological case that uses the definition of $h^e_\mu(\mathcal U)$.

6.1. Theorem. $h_\mu(\mathcal U) \le h_c(\mathcal U)$.

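Proof. First, if the system is ergodic, then $h_\mu(\mathcal U) = \lim \frac1n \log \mathcal N(\mathcal U_0^{n-1}, \frac12)$, and as $\mathcal N(\mathcal U_0^{n-1}, \frac12) \le \mathcal N(\mathcal U_0^{n-1})$, we have

$$h_\mu(\mathcal U) \le \lim \frac1n \log \mathcal N(\mathcal U_0^{n-1}) = h_c(\mathcal U)$$

as desired. In the non-ergodic case, let $\mu = \int \mu_x \, d\mu(x)$ be the ergodic decomposition of $\mu$. By Theorem 5.1, $h_\mu(\mathcal U) = \int h_{\mu_x}(\mathcal U)\, d\mu(x)$, so from the first part we see that $h_\mu(\mathcal U) \le h_c(\mathcal U)$. □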

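Remark: Another simple proof of the above uses the definition of $h^-_\mu(\mathcal U)$:

$$H_\mu(\mathcal U_0^{n-1}) = \inf_{\alpha \succeq \mathcal U_0^{n-1}} H_\mu(\alpha) \le \inf_{\alpha \succeq \mathcal U_0^{n-1}} \log|\alpha| \le \log \mathcal N(\mathcal U_0^{n-1})$$

$$h_\mu(\mathcal U) = \lim \frac1n H_\mu(\mathcal U_0^{n-1}) \le \lim \frac1n \log \mathcal N(\mathcal U_0^{n-1}) = h_c(\mathcal U).$$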
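
The only entropy fact used in this remark is that a partition with $|\alpha|$ atoms has entropy at most $\log|\alpha|$. A minimal numerical check, with hypothetical atom measures and a helper name `entropy` chosen for the example, could look as follows (logarithms are taken to base 2).

```python
import math


def entropy(probabilities):
    """Shannon entropy -sum p log2 p of a finite probability vector."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical measures of the atoms of a partition with 4 atoms.
atoms = [0.5, 0.25, 0.125, 0.125]
h = entropy(atoms)
assert h <= math.log2(len(atoms))

# Equality holds for the uniform distribution.
uniform = [0.25] * 4
assert math.isclose(entropy(uniform), math.log2(4))
```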

From this stage until the end of this paper we assume that $(X, T)$ is a t.d.s. We denote by $\mathcal M_T(X)$ the set of $T$-invariant probability measures on $X$ and by $\mathcal M^e_T(X)$ the set of ergodic ones. Also, $\mathcal C^o_X$ will denote the set of finite open covers of $X$.

In [BGH], the following theorem was proved: 

6.2. Theorem. (Theorem 1 in [BGH]): If $\mathcal U \in \mathcal C^o_X$, then there exists $\mu \in \mathcal M_T(X)$ such that $h_\mu(\mathcal U) \ge h_{top}(\mathcal U)$.

In light of Theorem 6.1 we have that for every $\mathcal U \in \mathcal C^o_X$, one can find a measure $\mu \in \mathcal M_T(X)$ such that $h_\mu(\mathcal U) = h_{top}(\mathcal U)$. In fact Theorem 7 in [HMRY] now becomes:

6.3. Corollary. For every $\mathcal U \in \mathcal C^o_X$, one can find a measure $\mu \in \mathcal M^e_T(X)$ such that

$$h_\mu(\mathcal U) = h_{top}(\mathcal U).$$

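Proof. Choose $\mu \in \mathcal M_T(X)$ such that $h_\mu(\mathcal U) = h_{top}(\mathcal U)$, and let $\mu = \int \mu_x \, d\mu(x)$ be its ergodic decomposition. We know that

$$h_{top}(\mathcal U) = h_\mu(\mathcal U) = \int h_{\mu_x}(\mathcal U)\, d\mu(x)$$

and that $h_{\mu_x}(\mathcal U) \le h_{top}(\mathcal U)$. So we must have $h_{\mu_x}(\mathcal U) = h_{top}(\mathcal U)$ for $[\mu]$ a.e. $x$. □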

We now deduce from the above the classical variational principle. First we state a technical lemma, taken from [Ye].

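6.4. Lemma. For any $\epsilon > 0$, $\mu \in \mathcal M_T(X)$ and $\alpha = \{A_1 \dots A_M\} \in \mathcal P_X$, there exists an open cover $\mathcal U \in \mathcal C^o_X$ such that for every partition $\beta \succeq \mathcal U$ one has $H_\mu(\alpha|\beta) < \epsilon$.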

6.5. Theorem. (The Variational Principle):

(a) For every $\mu \in \mathcal M_T(X)$, $h_\mu(T) \le h_{top}(T)$.
(b) $\sup_{\mu \in \mathcal M^e_T(X)} h_\mu(T) = h_{top}(T)$.

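Proof. To prove (a), we first show that for each $\mu \in \mathcal M_T(X)$, $h_\mu(T) = \sup_{\mathcal U \in \mathcal C^o_X} h_\mu(\mathcal U, T)$. If this is done, then from Theorem 6.1 we get

$$h_\mu(T) \le \sup_{\mathcal U \in \mathcal C^o_X} h_{top}(\mathcal U, T) = h_{top}(T).$$

It follows from the definition that for any cover $\mathcal U$ of $X$ we have $h_\mu(\mathcal U, T) \le h_\mu(T)$, so one inequality is clear. For the other inequality, fix a partition $\alpha = \{A_1 \dots A_M\}$ of $X$ and $\epsilon > 0$. We need to find an open cover $\mathcal U$ of $X$ such that $h_\mu(\alpha, T) \le h_\mu(\mathcal U, T) + \epsilon$. By the preceding lemma and from the fact that for any $\beta \in \mathcal P_X$ one has $h_\mu(\alpha) \le h_\mu(\beta) + H_\mu(\alpha|\beta)$, we have $\mathcal U \in \mathcal C^o_X$ such that

$$h_\mu(\mathcal U, T) = \inf_{\beta \succeq \mathcal U} h_\mu(\beta, T) \ge \inf_{\beta \succeq \mathcal U} \big(h_\mu(\alpha, T) - H_\mu(\alpha|\beta)\big) \ge h_\mu(\alpha, T) - \epsilon.$$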
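
The comparison between two partitions used above has a static counterpart, $H_\mu(\alpha) \le H_\mu(\beta) + H_\mu(\alpha|\beta)$, which can be checked on a finite joint distribution. The sketch below covers only this static inequality, not the dynamical one; the joint distribution and the helper names `H` and `conditional_entropy` are assumptions made for the example.

```python
import math


def H(probabilities):
    """Shannon entropy (base 2) of a finite probability vector."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def conditional_entropy(joint):
    """H(alpha | beta), where joint[a][b] is the measure of A_a intersected with B_b.

    Computed as H(alpha v beta) - H(beta).
    """
    beta = [sum(row[b] for row in joint) for b in range(len(joint[0]))]
    flat = [p for row in joint for p in row]
    return H(flat) - H(beta)


# Hypothetical joint distribution of two partitions with two atoms each.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
alpha = [sum(row) for row in joint]
beta = [sum(row[b] for row in joint) for b in range(2)]
cond = conditional_entropy(joint)

# Static form of the inequality: H(alpha) <= H(beta) + H(alpha | beta).
assert H(alpha) <= H(beta) + cond + 1e-12
```

The small tolerance only guards against floating-point rounding.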

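To prove (b), note that from (6.3) we know that for any $\mathcal U \in \mathcal C^o_X$ we can find $\mu \in \mathcal M^e_T(X)$ such that $h_\mu(\mathcal U, T) = h_{top}(\mathcal U, T)$. This gives us

$$\sup_{\mu \in \mathcal M^e_T(X)} h_\mu(T) \ge h_{top}(\mathcal U, T) \ \Longrightarrow\ \sup_{\mu \in \mathcal M^e_T(X)} h_\mu(T) \ge \sup_{\mathcal U \in \mathcal C^o_X} h_{top}(\mathcal U, T) = h_{top}(T).$$

Together with (a), we get equality, which is (b). □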